Nested Variational Compression for Deep GPs

James Hensman and Neil D. Lawrence

8th May 2015

In this notebook we give a simple demonstration of the nested variational compression approach to deep Gaussian processes. First we perform some setup and load in the code: the deep GPs are currently released as an augmentation of our GPy software, so we import GPy alongside the modules for building deep GPs.


In [1]:
import numpy as np  # used throughout the notebook
import GPy

from coldeep import ColDeep
from coldeep import build_supervised
from layers import *


mpi not found

Next some plotting for the notebook.


In [2]:
import matplotlib
from matplotlib import pyplot as plt
matplotlib.rcParams['figure.figsize'] = (16,8)
%matplotlib inline

Non-Gaussian Derivatives

Gaussian process models have Gaussian-distributed derivatives. This means that they tend to struggle to approximate step functions, whose derivatives are either zero or infinite. Duvenaud et al. (2014) showed that as we increase the number of layers in the model, the distribution over derivatives becomes more heavy tailed. Let's examine this in practice by fitting the models to a step function.
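
To get a feel for this effect, here is a small self-contained illustration (an extra cell, not part of the coldeep code): we draw a sample path from a GP with an RBF kernel, then pass it through a second, independently sampled GP. The composed function shows the flatter regions and sharper transitions that a single GP sample lacks.


In [ ]:
def sample_gp(X, lengthscale=0.2, variance=1.0, jitter=1e-6):
    # one draw from a zero-mean GP with an RBF kernel at inputs X
    K = variance * np.exp(-0.5 * (X - X.T)**2 / lengthscale**2)
    L = np.linalg.cholesky(K + jitter * np.eye(X.shape[0]))
    return L.dot(np.random.randn(X.shape[0], 1))

Xgrid = np.linspace(0, 1, 200)[:, None]
f1 = sample_gp(Xgrid)  # an ordinary (one-layer) GP sample
f2 = sample_gp(f1)     # two layers: a second GP sample evaluated at f1
plt.plot(Xgrid, f1, label='one layer')
plt.plot(Xgrid, f2, label='two layers')
_ = plt.legend()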


In [3]:
np.random.seed(0)
n = 30  # number of data
d = 1   # number of dimensions
# input variable is linearly spaced
X = np.linspace(0, 1, n)[:, None]
# response variable is a noisy step function
Y = np.where(X > 0.5, 1, 0) + np.random.randn(n, 1) * 0.02
# where to plot the model predictions
Xtest = np.linspace(-1, 2, 500)[:, None]
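
Before fitting anything, a quick extra cell to plot the data shows the step we are asking the models to capture.


In [ ]:
# visualise the noisy step data
plt.plot(X, Y, 'kx', mew=2)
plt.xlabel('x')
plt.ylabel('y')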

Modelling with a GP

Now we attempt to model the step function with a Gaussian process. Hyperparameters are chosen by type-II maximum likelihood.


In [4]:
model0 = GPy.models.GPRegression(X,Y)
model0.optimize('bfgs', max_iters=1000, messages=1)


We can plot the regression to see if it has managed to fit the data.


In [5]:
_ = model0.plot()


We note that the fitted model is overly smooth and its predictive variance is too high.
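
Printing the model displays GPy's optimized parameter table, which lets us check the estimated noise variance directly (an extra inspection cell).


In [ ]:
print(model0)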

Deep GP

Now we will consider a deep Gaussian process. First, we'll set up some model parameters and a helper function.

The predictive density of a deep GP is not Gaussian, so we can't plot it directly; instead we'll visualise the model with Monte Carlo samples. To aid comparison, we first plot Monte Carlo samples from the original GP.
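
As a quick sketch (assuming GPy's posterior_samples_f method; its return shape differs across GPy versions, hence the reshape), samples from the single-layer GP posterior can be drawn and plotted like this.


In [ ]:
# draw a few samples from the posterior over f at the test points;
# reshape to (N, samples) to cover both GPy return conventions
samples = model0.posterior_samples_f(Xtest, size=3)
plt.plot(Xtest, samples.reshape(Xtest.shape[0], -1))
plt.plot(X, Y, 'kx', mew=2)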

One Hidden Layer

For our first experiment we create a deep GP with one hidden layer. The model is constructed by creating a layer object for each GP in the hierarchy and concatenating them; the build_supervised helper does this for us, with Qs giving the dimensionality of each hidden layer and Ms the number of inducing points in each GP layer.


In [6]:
model1 = build_supervised(X, Y, Qs=(1,), Ms=(15,15))

Now we optimize the model with the L-BFGS algorithm.


In [7]:
model1.optimize('bfgs', max_iters=1000, messages=1)

In [8]:
model1.plot(xlim=(-1, 2), Nsamples=3)


Two Hidden Layers

Next we consider two hidden layers.


In [9]:
model2 = build_supervised(X, Y, Qs=(1,1), Ms=(15,15,15))
model2.optimize('bfgs', max_iters=1000, messages=1)

In [10]:
model2.plot(xlim=(-1, 2), Nsamples=3)


Three Hidden Layers

Finally we consider three hidden layers.


In [11]:
model3 = build_supervised(X, Y, Qs=(2, 2, 2), Ms=(15, 15, 15, 15))
model3.optimize('bfgs', max_iters=5000, messages=1)

In [12]:
model3.plot(xlim=(-1, 2), Nsamples=3)


Between the Layers

We can also explore what's going on between the layers by plotting each of the constituent Gaussian processes. The plots show what each layer's mapping function looks like and how the inducing variables propagate through the hierarchy.


In [13]:
for layer in model2.layers:
    layer.plot()


Example on Robot Wireless Data


In [ ]:
import pods
data = pods.datasets.robot_wireless()
Y = data['Y']  # WiFi access-point signal strengths
n = Y.shape[0]
# use a time index as the one-dimensional input
t = np.linspace(0, n-1, n)[:, None]


Acquiring resource: robot_wireless

Details of data: 
Data created by Brian Ferris and Dieter Fox. Consists of WiFi access point strengths taken during a circuit of the Paul Allen building at the University of Washington.

Please cite:
WiFi-SLAM using Gaussian Process Latent Variable Models by Brian Ferris, Dieter Fox and Neil Lawrence in IJCAI'07 Proceedings pages 2480-2485. Data used in A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction: Insights and New Models by Neil D. Lawrence, JMLR 13 pg 1609--1638, 2012.

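Each column of Y holds the strength of one access point over time; a quick extra cell to plot the raw signals gives a feel for the data.


In [ ]:
# each column is one access point's signal strength over the circuit
plt.plot(t, Y)
plt.xlabel('time index')
plt.ylabel('signal strength')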

In [ ]:
model = build_supervised(t, Y, Qs=(2, 15), Ms=(40, 40, 40))
model.optimize(messages=True)
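
Once the optimization has finished we can inspect the fitted layers just as we did for the step function; the two-dimensional hidden layer should reflect the robot's circuit of the building.


In [ ]:
for layer in model.layers:
    layer.plot()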
